Load the necessary libraries
library(rstanarm) # for fitting models in STAN
library(brms) # for fitting models in STAN
library(coda) # for diagnostics
library(bayesplot) # for diagnostics
library(ggmcmc) # for diagnostics
library(rstan) # for interfacing with STAN
library(DHARMa) # for residual diagnostics
library(emmeans) # for marginal means etc
library(broom) # for tidying outputs
library(broom.mixed) # for tidying MCMC outputs
library(tidybayes) # for more tidying outputs
library(ggeffects) # for partial plots
library(tidyverse) # for data wrangling etc
library(patchwork) # for multiple figures
source("helperFunctions.R")
An ecologist studying a rocky shore at Phillip Island, in southeastern Australia, was interested in how clumps of intertidal mussels are maintained (Quinn 1988). In particular, he wanted to know how densities of adult mussels affected recruitment of young individuals from the plankton. As with most marine invertebrates, recruitment is highly patchy in time, so he expected to find seasonal variation, and the interaction between season and density - whether effects of adult mussel density vary across seasons - was the aspect of most interest.
The data were collected across four seasons at two densities of adult mussels. The experiment consisted of clumps of adult mussels attached to the rocks. These clumps were brought back to the laboratory, and the number of newly recruited (baby) mussels was recorded. There were 3-6 replicate clumps for each density and season combination.
Format of the quinn.csv data file
| SEASON | DENSITY | RECRUITS | SQRTRECRUITS | GROUP |
|---|---|---|---|---|
| Spring | Low | 15 | 3.87 | SpringLow |
| .. | .. | .. | .. | .. |
| Spring | High | 11 | 3.32 | SpringHigh |
| .. | .. | .. | .. | .. |
| Summer | Low | 21 | 4.58 | SummerLow |
| .. | .. | .. | .. | .. |
| Summer | High | 34 | 5.83 | SummerHigh |
| .. | .. | .. | .. | .. |
| Autumn | Low | 14 | 3.74 | AutumnLow |
| .. | .. | .. | .. | .. |
| SEASON | Categorical listing of the season in which the mussel clumps were collected (independent variable) |
| DENSITY | Categorical listing of the density of adult mussels within the mussel clump (independent variable) |
| RECRUITS | The number of mussel recruits (response variable) |
| SQRTRECRUITS | Square root transformation of RECRUITS - needed to meet the test assumptions |
| GROUP | Categorical listing of Season/Density combinations - used for checking ANOVA assumptions |
quinn <- read_csv("../public/data/quinn.csv", trim_ws = TRUE)
quinn %>% glimpse()
## Rows: 42
## Columns: 5
## $ SEASON <chr> "Spring", "Spring", "Spring", "Spring", "Spring", "Spring…
## $ DENSITY <chr> "Low", "Low", "Low", "Low", "Low", "High", "High", "High"…
## $ RECRUITS <dbl> 15, 10, 13, 13, 5, 11, 10, 15, 10, 13, 1, 21, 31, 21, 18,…
## $ SQRTRECRUITS <dbl> 3.872983, 3.162278, 3.605551, 3.605551, 2.236068, 3.31662…
## $ GROUP <chr> "SpringLow", "SpringLow", "SpringLow", "SpringLow", "Spri…
quinn %>% summary()
## SEASON DENSITY RECRUITS SQRTRECRUITS
## Length:42 Length:42 Min. : 0.00 Min. :0.000
## Class :character Class :character 1st Qu.: 9.25 1st Qu.:3.041
## Mode :character Mode :character Median :13.50 Median :3.674
## Mean :18.33 Mean :3.871
## 3rd Qu.:21.75 3rd Qu.:4.663
## Max. :69.00 Max. :8.307
## GROUP
## Length:42
## Class :character
## Mode :character
##
##
##
quinn <- quinn %>%
mutate(
SEASON = factor(SEASON,
levels = c("Spring", "Summer", "Autumn", "Winter")
),
DENSITY = factor(DENSITY)
)
Model formula: \[ \begin{align} y_i &\sim{} \mathcal{NB}(\mu_i, \theta)\\ ln(\mu_i) &= \boldsymbol{\beta} \bf{X_i}\\ \beta_0 &\sim{} \mathcal{N}(0,10)\\ \beta_{1,2,3} &\sim{} \mathcal{N}(0,2.5)\\ \theta &\sim{} \mathcal{Exp}(1) \end{align} \]
where \(\boldsymbol{\beta}\) is a vector of effects parameters and \(\bf{X}\) is a model matrix representing the intercept and effects of season, density and their interaction on mussel recruitment.
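To make \(\bf{X}\) concrete, we can inspect the model matrix that R constructs from this formula. Nothing is fitted here; this just uses base R's model.matrix() with the default treatment contrasts, so the Spring/High combination is absorbed into the intercept.

```r
## Inspect the model matrix implied by the formula (treatment contrasts,
## so the Spring/High combination is the reference absorbed into the intercept)
X <- model.matrix(~ SEASON * DENSITY, data = quinn)
colnames(X)
head(X)
```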
The exploratory data analyses that we performed in the frequentist instalment of this example are equally valid here. That is, boxplots and/or violin plots for each season/density combination.
quinn %>% head()
quinn %>% ggplot(aes(y = RECRUITS, x = SEASON, fill = DENSITY)) +
geom_boxplot()
Conclusions: recruitment appears highest in summer and lowest in winter, and the spread of the data increases with the mean - a pattern worth keeping in mind when assessing the fit of a Poisson model.
In rstanarm, the default priors are designed to be
weakly informative. They are chosen to provide moderate regularisation
(to help prevent over-fitting) and help stabilise the computations.
quinn.rstanarmP <- stan_glm(RECRUITS ~ SEASON * DENSITY,
data = quinn,
family = poisson(link = "log"),
refresh = 0,
chains = 3, iter = 5000, thin = 5, warmup = 2000
)
quinn.rstanarmP %>% prior_summary()
## Priors for model '.'
## ------
## Intercept (after predictors centered)
## ~ normal(location = 0, scale = 2.5)
##
## Coefficients
## Specified prior:
## ~ normal(location = [0,0,0,...], scale = [2.5,2.5,2.5,...])
## Adjusted prior:
## ~ normal(location = [0,0,0,...], scale = [5.47,5.80,6.02,...])
## ------
## See help('prior_summary.stanreg') for more details
This tells us:
for the intercept (defined after the predictors have been centred), when the family is Poisson, a normal prior with a mean of 0 and a standard deviation of 2.5 is used; since no adjusted prior is reported for the intercept, the default scale was retained.
for the coefficients (in this case, the individual effects), the default prior is a normal prior centred on 0 with a standard deviation of 2.5. This is then adjusted for the scale of the data by dividing the 2.5 by the standard deviation of each numerical dummy variable of the predictors, yielding the adjusted scales of 5.47, 5.80, 6.02, etc.
2.5 / apply(model.matrix(~ SEASON * DENSITY, quinn)[, -1], 2, sd)
## SEASONSummer SEASONAutumn SEASONWinter
## 5.467708 5.799380 6.019749
## DENSITYLow SEASONSummer:DENSITYLow SEASONAutumn:DENSITYLow
## 4.991312 7.058781 8.414625
## SEASONWinter:DENSITYLow
## 9.590995
quinn.rstanarm1 <- stan_glm(RECRUITS ~ SEASON * DENSITY,
data = quinn,
family = poisson(link = "log"),
prior_PD = TRUE,
refresh = 0,
chains = 3, iter = 5000, thin = 5, warmup = 2000
)
ggpredict(quinn.rstanarm1, ~ SEASON + DENSITY) %>% plot()
ggemmeans(quinn.rstanarm1, ~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
Conclusions:
The following link provides some guidance about defining priors. [https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations]
When defining our own priors, we typically do not want them to be scaled.
If we wanted to define our own priors that were less vague, yet still not likely to bias the outcomes, we could try the following priors (mainly plucked out of thin air):
Remember the above are applied on the link scale.
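As a rough sanity check, link-scale priors like these can be back-transformed with exp() to see what they imply on the response (count) scale; this uses only base R and makes no model-specific assumptions.

```r
## The normal(2.3, 5) intercept prior, back-transformed to recruit counts:
## a median of about exp(2.3) ~ 10 recruits, but with extremely wide tails
exp(qnorm(c(0.025, 0.5, 0.975), mean = 2.3, sd = 5))
```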
I will also overlay the raw data for comparison.
quinn.rstanarm2 <- stan_glm(RECRUITS ~ SEASON * DENSITY,
data = quinn,
family = poisson(link = "log"),
prior_intercept = normal(2.3, 5, autoscale = FALSE),
prior = normal(0, 2, autoscale = FALSE),
prior_PD = TRUE,
refresh = 0,
chains = 3, iter = 5000, thin = 5, warmup = 2000
)
quinn.rstanarm2 %>%
ggpredict(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
Now let's refit, conditioning on the data.
quinn.rstanarm3 <- quinn.rstanarm2 %>% update(prior_PD = FALSE)
quinn.rstanarm3 %>% posterior_vs_prior(
color_by = "vs", group_by = TRUE,
facet_args = list(scales = "free_y")
)
Conclusions:
quinn.rstanarm3 %>%
ggpredict(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
quinn.rstanarm3 %>%
ggemmeans(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
In brms, the default priors are designed to be weakly
informative. They are chosen to provide moderate regularisation (to help
prevent over-fitting) and help stabilise the computations.
Unlike rstanarm, brms models must be
compiled before they start sampling. For most models, compiling the
Stan code takes around 45 seconds.
quinn.form <- bf(RECRUITS ~ SEASON * DENSITY, family = poisson(link = "log"))
get_prior(quinn.form, data = quinn)
Remember that the priors are applied on the link (in this case, log) scale.
quinn %>%
    group_by(SEASON, DENSITY) %>%
    summarise(
        Mean = mean(RECRUITS),
        MAD = mad(RECRUITS)
    ) %>%
    mutate(
        logMean = log(Mean),
        logMAD = log(MAD)
    )
priors <- prior(normal(2.3, 2), class = "Intercept") +
prior(normal(0, 1), class = "b")
quinn.brm2 <- brm(quinn.form,
data = quinn,
prior = priors,
sample_prior = "only",
refresh = 0,
chains = 3,
iter = 5000,
thin = 5,
warmup = 2000
)
quinn.brm2 %>%
ggpredict(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
quinn.brm2 %>%
ggpredict(~ SEASON + DENSITY) %>%
plot(add.data = TRUE) + scale_y_log10()
quinn.brm2 %>%
ggemmeans(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
quinn.brm2 %>%
ggemmeans(~ SEASON + DENSITY) %>%
plot(add.data = TRUE) + scale_y_log10()
quinn.brm2 %>%
conditional_effects("SEASON:DENSITY") %>%
plot(points = TRUE)
quinn.brm2 %>%
conditional_effects("SEASON:DENSITY") %>%
plot(points = TRUE) %>%
`[[`(1) + scale_y_log10()
quinn.brmP <- quinn.brm2 %>% update(sample_prior = "yes", refresh = 0)
quinn.brmP %>% get_variables()
## [1] "b_Intercept" "b_SEASONSummer"
## [3] "b_SEASONAutumn" "b_SEASONWinter"
## [5] "b_DENSITYLow" "b_SEASONSummer:DENSITYLow"
## [7] "b_SEASONAutumn:DENSITYLow" "b_SEASONWinter:DENSITYLow"
## [9] "prior_Intercept" "prior_b"
## [11] "lp__" "accept_stat__"
## [13] "stepsize__" "treedepth__"
## [15] "n_leapfrog__" "divergent__"
## [17] "energy__"
quinn.brmP %>%
hypothesis("SEASONSummer<0") %>%
plot()
quinn.brmP %>%
hypothesis("DENSITYLow<0") %>%
plot()
quinn.brmP %>%
hypothesis("SEASONSummer:DENSITYLow<0") %>%
plot()
quinn.brmP %>%
posterior_samples() %>%
dplyr::select(-`lp__`) %>%
pivot_longer(everything(), names_to = "key") %>%
mutate(
Type = ifelse(str_detect(key, "prior"), "Prior", "b"),
Class = ifelse(str_detect(key, "Intercept"), "Intercept",
ifelse(str_detect(key, "b"), "b", "sigma")
),
Par = str_replace(key, "b_", "")
) %>%
ggplot(aes(x = Type, y = value, color = Par)) +
stat_pointinterval(position = position_dodge()) +
facet_wrap(~Class, scales = "free")
quinn.brmP %>% standata()
## $N
## [1] 42
##
## $Y
## [1] 15 10 13 13 5 11 10 15 10 13 1 21 31 21 18 14 27 34 49 69 55 28 54 14 18
## [26] 20 21 4 22 30 36 13 13 8 0 0 10 1 5 9 4 5
##
## $K
## [1] 8
##
## $X
## Intercept SEASONSummer SEASONAutumn SEASONWinter DENSITYLow
## 1 1 0 0 0 1
## 2 1 0 0 0 1
## 3 1 0 0 0 1
## 4 1 0 0 0 1
## 5 1 0 0 0 1
## 6 1 0 0 0 0
## 7 1 0 0 0 0
## 8 1 0 0 0 0
## 9 1 0 0 0 0
## 10 1 0 0 0 0
## 11 1 0 0 0 0
## 12 1 1 0 0 1
## 13 1 1 0 0 1
## 14 1 1 0 0 1
## 15 1 1 0 0 1
## 16 1 1 0 0 1
## 17 1 1 0 0 1
## 18 1 1 0 0 0
## 19 1 1 0 0 0
## 20 1 1 0 0 0
## 21 1 1 0 0 0
## 22 1 1 0 0 0
## 23 1 1 0 0 0
## 24 1 0 1 0 1
## 25 1 0 1 0 1
## 26 1 0 1 0 1
## 27 1 0 1 0 1
## 28 1 0 1 0 0
## 29 1 0 1 0 0
## 30 1 0 1 0 0
## 31 1 0 1 0 0
## 32 1 0 1 0 0
## 33 1 0 1 0 0
## 34 1 0 0 1 1
## 35 1 0 0 1 1
## 36 1 0 0 1 1
## 37 1 0 0 1 0
## 38 1 0 0 1 0
## 39 1 0 0 1 0
## 40 1 0 0 1 0
## 41 1 0 0 1 0
## 42 1 0 0 1 0
## SEASONSummer:DENSITYLow SEASONAutumn:DENSITYLow SEASONWinter:DENSITYLow
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## 7 0 0 0
## 8 0 0 0
## 9 0 0 0
## 10 0 0 0
## 11 0 0 0
## 12 1 0 0
## 13 1 0 0
## 14 1 0 0
## 15 1 0 0
## 16 1 0 0
## 17 1 0 0
## 18 0 0 0
## 19 0 0 0
## 20 0 0 0
## 21 0 0 0
## 22 0 0 0
## 23 0 0 0
## 24 0 1 0
## 25 0 1 0
## 26 0 1 0
## 27 0 1 0
## 28 0 0 0
## 29 0 0 0
## 30 0 0 0
## 31 0 0 0
## 32 0 0 0
## 33 0 0 0
## 34 0 0 1
## 35 0 0 1
## 36 0 0 1
## 37 0 0 0
## 38 0 0 0
## 39 0 0 0
## 40 0 0 0
## 41 0 0 0
## 42 0 0 0
## attr(,"assign")
## [1] 0 1 1 1 2 3 3 3
## attr(,"contrasts")
## attr(,"contrasts")$SEASON
## Summer Autumn Winter
## Spring 0 0 0
## Summer 1 0 0
## Autumn 0 1 0
## Winter 0 0 1
##
## attr(,"contrasts")$DENSITY
## Low
## High 0
## Low 1
##
##
## $prior_only
## [1] 0
##
## attr(,"class")
## [1] "standata" "list"
quinn.brmP %>% stancode()
## // generated with brms 2.16.3
## functions {
## }
## data {
## int<lower=1> N; // total number of observations
## int Y[N]; // response variable
## int<lower=1> K; // number of population-level effects
## matrix[N, K] X; // population-level design matrix
## int prior_only; // should the likelihood be ignored?
## }
## transformed data {
## int Kc = K - 1;
## matrix[N, Kc] Xc; // centered version of X without an intercept
## vector[Kc] means_X; // column means of X before centering
## for (i in 2:K) {
## means_X[i - 1] = mean(X[, i]);
## Xc[, i - 1] = X[, i] - means_X[i - 1];
## }
## }
## parameters {
## vector[Kc] b; // population-level effects
## real Intercept; // temporary intercept for centered predictors
## }
## transformed parameters {
## }
## model {
## // likelihood including constants
## if (!prior_only) {
## target += poisson_log_glm_lpmf(Y | Xc, Intercept, b);
## }
## // priors including constants
## target += normal_lpdf(b | 0, 1);
## target += normal_lpdf(Intercept | 2.3, 2);
## }
## generated quantities {
## // actual population-level intercept
## real b_Intercept = Intercept - dot_product(means_X, b);
## // additionally sample draws from priors
## real prior_b = normal_rng(0,1);
## real prior_Intercept = normal_rng(2.3,2);
## }
In addition to the regular model diagnostics checking, for Bayesian analyses, it is also necessary to explore the MCMC sampling diagnostics to be sure that the chains are well mixed and have converged on a stable posterior.
There are a wide variety of tests that range from the big picture, overall chain characteristics to the very specific detailed tests that allow the experienced modeller to drill down to the very fine details of the chain behaviour. Furthermore, there are a multitude of packages and approaches for exploring these diagnostics.
The bayesplot package offers a range of MCMC diagnostics
as well as posterior predictive checks (PPC), all of which have a
convenient plot() interface. Let's start with the MCMC
diagnostics.
available_mcmc()
## bayesplot MCMC module:
## mcmc_acf
## mcmc_acf_bar
## mcmc_areas
## mcmc_areas_data
## mcmc_areas_ridges
## mcmc_areas_ridges_data
## mcmc_combo
## mcmc_dens
## mcmc_dens_chains
## mcmc_dens_chains_data
## mcmc_dens_overlay
## mcmc_hex
## mcmc_hist
## mcmc_hist_by_chain
## mcmc_intervals
## mcmc_intervals_data
## mcmc_neff
## mcmc_neff_data
## mcmc_neff_hist
## mcmc_nuts_acceptance
## mcmc_nuts_divergence
## mcmc_nuts_energy
## mcmc_nuts_stepsize
## mcmc_nuts_treedepth
## mcmc_pairs
## mcmc_parcoord
## mcmc_parcoord_data
## mcmc_rank_hist
## mcmc_rank_overlay
## mcmc_recover_hist
## mcmc_recover_intervals
## mcmc_recover_scatter
## mcmc_rhat
## mcmc_rhat_data
## mcmc_rhat_hist
## mcmc_scatter
## mcmc_trace
## mcmc_trace_data
## mcmc_trace_highlight
## mcmc_violin
Of these, we will focus on:
plot(quinn.rstanarm3, plotfun = "mcmc_trace")
The chains appear well mixed and very similar
plot(quinn.rstanarm3, "acf_bar")
There is no evidence of auto-correlation in the MCMC samples
plot(quinn.rstanarm3, "rhat_hist")
All Rhat values are below 1.05, suggesting the chains have converged.
neff (number of effective samples): the ratio of the effective number of samples (the number of effectively independent draws, after accounting for autocorrelation between successive draws) to the total number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are strongly autocorrelated and therefore carry relatively little independent information.
If the ratios are low, thinning more heavily or tightening the priors may help.
plot(quinn.rstanarm3, "neff_hist")
Ratios all very high.
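The same quantities can also be extracted numerically rather than graphically; bayesplot provides neff_ratio() and rhat() extractor functions that work on stanreg objects.

```r
## Numeric versions of the diagnostics plotted above
quinn.rstanarm3 %>% neff_ratio() # ratio of effective to total samples
quinn.rstanarm3 %>% rhat()       # potential scale reduction factors
```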
plot(quinn.rstanarm3, "combo")
plot(quinn.rstanarm3, "violin")
The rstan package offers a range of MCMC diagnostics.
Let's start with the MCMC diagnostics.
Of these, we will focus on:
stan_trace(quinn.rstanarm3)
The chains appear well mixed and very similar
stan_ac(quinn.rstanarm3)
There is no evidence of auto-correlation in the MCMC samples
stan_rhat(quinn.rstanarm3)
All Rhat values are below 1.05, suggesting the chains have converged.
stan_ess (number of effective samples): the ratio of the effective number of samples (the number of effectively independent draws, after accounting for autocorrelation between successive draws) to the total number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are strongly autocorrelated and therefore carry relatively little independent information.
If the ratios are low, thinning more heavily or tightening the priors may help.
stan_ess(quinn.rstanarm3)
Ratios all very high.
stan_dens(quinn.rstanarm3, separate_chains = TRUE)
The ggmcmc package also has a set of MCMC diagnostic
functions. Let's start with the MCMC diagnostics.
Of these, we will focus on:
quinn.ggs <- ggs(quinn.rstanarm3, burnin = FALSE, inc_warmup = FALSE)
ggs_traceplot(quinn.ggs)
The chains appear well mixed and very similar
ggs_autocorrelation(quinn.ggs)
There is no evidence of auto-correlation in the MCMC samples
ggs_Rhat(quinn.ggs)
All Rhat values are below 1.05, suggesting the chains have converged.
ggs_effective (number of effective samples): the ratio of the effective number of samples (the number of effectively independent draws, after accounting for autocorrelation between successive draws) to the total number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are strongly autocorrelated and therefore carry relatively little independent information.
If the ratios are low, thinning more heavily or tightening the priors may help.
ggs_effective(quinn.ggs)
Ratios all very high.
ggs_crosscorrelation(quinn.ggs)
ggs_grb(quinn.ggs)
The brms package offers a range of MCMC diagnostics.
Let's start with the MCMC diagnostics.
Of these, we will focus on:
quinn.brmP$fit %>% stan_trace()
quinn.brmP$fit %>% stan_trace(inc_warmup = TRUE)
The chains appear well mixed and very similar
quinn.brmP$fit %>% stan_ac()
There is no evidence of auto-correlation in the MCMC samples
quinn.brmP$fit %>% stan_rhat()
All Rhat values are below 1.05, suggesting the chains have converged.
stan_ess (number of effective samples): the ratio of the effective number of samples (the number of effectively independent draws, after accounting for autocorrelation between successive draws) to the total number of samples provides an indication of the effectiveness (and efficiency) of the MCMC sampler. Ratios less than 0.5 for a parameter suggest that the draws are strongly autocorrelated and therefore carry relatively little independent information.
If the ratios are low, thinning more heavily or tightening the priors may help.
quinn.brmP$fit %>% stan_ess()
Ratios all very high.
quinn.brmP$fit %>% stan_dens(separate_chains = TRUE)
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
available_ppc()
## bayesplot PPC module:
## ppc_bars
## ppc_bars_grouped
## ppc_boxplot
## ppc_data
## ppc_dens
## ppc_dens_overlay
## ppc_dens_overlay_grouped
## ppc_ecdf_overlay
## ppc_ecdf_overlay_grouped
## ppc_error_binned
## ppc_error_hist
## ppc_error_hist_grouped
## ppc_error_scatter
## ppc_error_scatter_avg
## ppc_error_scatter_avg_vs_x
## ppc_freqpoly
## ppc_freqpoly_grouped
## ppc_hist
## ppc_intervals
## ppc_intervals_data
## ppc_intervals_grouped
## ppc_km_overlay
## ppc_loo_intervals
## ppc_loo_pit
## ppc_loo_pit_data
## ppc_loo_pit_overlay
## ppc_loo_pit_qq
## ppc_loo_ribbon
## ppc_ribbon
## ppc_ribbon_data
## ppc_ribbon_grouped
## ppc_rootogram
## ppc_scatter
## ppc_scatter_avg
## ppc_scatter_avg_grouped
## ppc_stat
## ppc_stat_2d
## ppc_stat_freqpoly_grouped
## ppc_stat_grouped
## ppc_violin_grouped
pp_check(quinn.rstanarm3, plotfun = "dens_overlay")
The model draws are broadly similar to the observed data.
pp_check(quinn.rstanarm3, plotfun = "error_scatter_avg")
The predictive error seems to be related to the predictor - the model performs poorest at higher recruitments
pp_check(quinn.rstanarm3, x = as.numeric(quinn$SEASON), plotfun = "error_scatter_avg_vs_x")
pp_check(quinn.rstanarm3, x = as.numeric(quinn$DENSITY), plotfun = "error_scatter_avg_vs_x")
pp_check(quinn.rstanarm3, x = as.numeric(quinn$SEASON), plotfun = "intervals")
The modelled predictions seem to do a reasonable job of representing the observations.
The shinystan package allows the full suite of MCMC
diagnostics and posterior predictive checks to be accessed via a web
interface.
# library(shinystan)
# launch_shinystan(quinn.rstanarm3)
DHARMa residuals provide very useful diagnostics. Unfortunately, we
cannot use the simulateResiduals() function directly to
generate the simulated residuals. However, if we are willing to
calculate some of the components ourselves, we can still obtain the
simulated residuals from the fitted stan model.
We need to supply the simulated (posterior predicted) responses, the observed responses and the fitted values (here, the median of the posterior predictions).
preds <- posterior_predict(quinn.rstanarm3, draws = 250)
quinn.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = quinn$RECRUITS,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = TRUE
)
plot(quinn.resids)
Conclusions:
Posterior predictive checks provide additional diagnostics about the fit of the model. Specifically, they provide a comparison between predictions drawn from the model and the observed data used to train the model.
available_ppc()
## bayesplot PPC module:
## ppc_bars
## ppc_bars_grouped
## ppc_boxplot
## ppc_data
## ppc_dens
## ppc_dens_overlay
## ppc_dens_overlay_grouped
## ppc_ecdf_overlay
## ppc_ecdf_overlay_grouped
## ppc_error_binned
## ppc_error_hist
## ppc_error_hist_grouped
## ppc_error_scatter
## ppc_error_scatter_avg
## ppc_error_scatter_avg_vs_x
## ppc_freqpoly
## ppc_freqpoly_grouped
## ppc_hist
## ppc_intervals
## ppc_intervals_data
## ppc_intervals_grouped
## ppc_km_overlay
## ppc_loo_intervals
## ppc_loo_pit
## ppc_loo_pit_data
## ppc_loo_pit_overlay
## ppc_loo_pit_qq
## ppc_loo_ribbon
## ppc_ribbon
## ppc_ribbon_data
## ppc_ribbon_grouped
## ppc_rootogram
## ppc_scatter
## ppc_scatter_avg
## ppc_scatter_avg_grouped
## ppc_stat
## ppc_stat_2d
## ppc_stat_freqpoly_grouped
## ppc_stat_grouped
## ppc_violin_grouped
quinn.brmP %>% pp_check(type = "dens_overlay")
The model draws appear to represent the shape of the observed data reasonably well.
quinn.brmP %>% pp_check(type = "error_scatter_avg")
The predictive error seems to be related to the predictor - the model performs poorest at higher recruitments.
quinn.brmP %>% pp_check(type = "intervals")
## quinn.brmP %>% pp_check(group='DENSITY', type='intervals')
Whilst the modelled predictions do a reasonable job of representing the observed data, the observed data do appear to be more varied than the model is representing.
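The suspicion that the data are more variable than the (Poisson) model allows can also be checked crudely by comparing group means and variances; for a Poisson distribution these should be approximately equal.

```r
## For a Poisson response, Var should be roughly equal to Mean in each group;
## Var substantially greater than Mean indicates overdispersion
quinn %>%
    group_by(SEASON, DENSITY) %>%
    summarise(Mean = mean(RECRUITS), Var = var(RECRUITS))
```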
The shinystan package allows the full suite of MCMC
diagnostics and posterior predictive checks to be accessed via a web
interface.
# library(shinystan)
# launch_shinystan(quinn.brmP)
DHARMa residuals provide very useful diagnostics. Unfortunately, we
cannot use the simulateResiduals() function directly to
generate the simulated residuals. However, if we are willing to
calculate some of the components ourselves, we can still obtain the
simulated residuals from the fitted stan model.
We need to supply the simulated (posterior predicted) responses, the observed responses and the fitted values (here, the median of the posterior predictions).
preds <- quinn.brmP %>% posterior_predict(ndraws = 250)
quinn.resids <- createDHARMa(
    simulatedResponse = t(preds),
    observedResponse = quinn$RECRUITS,
    fittedPredictedResponse = apply(preds, 2, median),
    integerResponse = TRUE
)
quinn.resids %>% plot()
quinn.resids %>% testDispersion()
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## dispersion = 2.6512, p-value < 2.2e-16
## alternative hypothesis: two.sided
quinn.resids %>% testZeroInflation()
##
## DHARMa zero-inflation test via comparison to expected zeros with
## simulation under H0 = fitted model
##
## data: simulationOutput
## ratioObsSim = 5.9211, p-value = 0.1056
## alternative hypothesis: two.sided
Conclusions: the DHARMa dispersion test provides strong evidence of overdispersion (dispersion statistic of 2.65, p < 0.001), whereas there is no evidence of zero-inflation (p = 0.106). A negative binomial model would therefore be more appropriate than the Poisson.
quinn.rstanarmNB <- stan_glm(RECRUITS ~ SEASON * DENSITY,
data = quinn,
family = neg_binomial_2(link = "log"),
prior_intercept = normal(2.3, 5, autoscale = FALSE),
prior = normal(0, 2, autoscale = FALSE),
prior_aux = rstanarm::exponential(rate = 1, autoscale = FALSE),
prior_PD = FALSE,
refresh = 0,
chains = 3, iter = 5000, thin = 5, warmup = 2000
)
posterior_vs_prior(quinn.rstanarmNB,
color_by = "vs", group_by = TRUE,
facet_args = list(scales = "free_y")
)
quinn.rstanarmNB %>%
ggpredict(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
quinn.rstanarmNB %>%
ggemmeans(~ SEASON + DENSITY, transform = TRUE) %>%
plot(add.data = TRUE)
quinn.rstanarmNB %>% plot("mcmc_trace")
quinn.rstanarmNB %>% plot("mcmc_acf_bar")
quinn.rstanarmNB %>% plot("mcmc_rhat_hist")
quinn.rstanarmNB %>% plot("mcmc_neff_hist")
preds <- posterior_predict(quinn.rstanarmNB, draws = 250)
quinn.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = quinn$RECRUITS,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = TRUE
)
plot(quinn.resids)
(loo.P <- loo(quinn.rstanarmP))
##
## Computed from 1800 by 42 log-likelihood matrix
##
## Estimate SE
## elpd_loo -172.2 16.2
## p_loo 25.6 6.9
## looic 344.5 32.5
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 35 83.3% 259
## (0.5, 0.7] (ok) 5 11.9% 150
## (0.7, 1] (bad) 1 2.4% 68
## (1, Inf) (very bad) 1 2.4% 2
## See help('pareto-k-diagnostic') for details.
(loo.NB <- loo(quinn.rstanarmNB))
##
## Computed from 1800 by 42 log-likelihood matrix
##
## Estimate SE
## elpd_loo -150.0 5.8
## p_loo 8.9 3.2
## looic 300.0 11.5
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 41 97.6% 536
## (0.5, 0.7] (ok) 0 0.0% <NA>
## (0.7, 1] (bad) 0 0.0% <NA>
## (1, Inf) (very bad) 1 2.4% 9
## See help('pareto-k-diagnostic') for details.
loo_compare(loo.P, loo.NB)
## elpd_diff se_diff
## quinn.rstanarmNB 0.0 0.0
## quinn.rstanarmP -22.2 12.3
quinn.form <- bf(RECRUITS ~ SEASON * DENSITY, family = negbinomial(link = "log"))
get_prior(quinn.form, data = quinn)
priors <- prior(normal(2.3, 2), class = "Intercept") +
prior(normal(0, 1), class = "b") +
prior(gamma(0.01, 0.01), class = "shape")
quinn.brmsNB <- brm(quinn.form,
data = quinn,
prior = priors,
refresh = 0,
chains = 3,
iter = 5000,
thin = 5,
warmup = 2000
)
preds <- posterior_predict(quinn.brmsNB, ndraws = 250)
quinn.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = quinn$RECRUITS,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = TRUE
)
plot(quinn.resids)
quinn.rstanarmNB %>%
ggpredict(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
quinn.rstanarmNB %>%
ggemmeans(~ SEASON | DENSITY, type = "fixed", transform = TRUE) %>%
plot(add.data = TRUE)
quinn.rstanarmNB %>%
fitted_draws(newdata = quinn) %>%
median_hdci() %>%
ggplot(aes(x = SEASON, colour = DENSITY, y = .value)) +
geom_pointrange(aes(ymin = .lower, ymax = .upper), position = position_dodge(width = 0.2)) +
geom_line(position = position_dodge(width = 0.2)) +
geom_point(data = quinn, aes(y = RECRUITS, x = SEASON, colour = DENSITY), position = position_dodge(width = 0.2))
quinn.brmsNB %>%
conditional_effects("SEASON:DENSITY") %>%
plot(points = TRUE)
quinn.brmsNB %>%
ggpredict(~ SEASON + DENSITY) %>%
plot(add.data = TRUE)
quinn.brmsNB %>%
ggemmeans(~ SEASON | DENSITY) %>%
plot(add.data = TRUE)
quinn.brmsNB %>%
fitted_draws(newdata = quinn) %>%
median_hdci() %>%
ggplot(aes(x = SEASON, colour = DENSITY, y = .value)) +
geom_pointrange(aes(ymin = .lower, ymax = .upper), position = position_dodge(width = 0.2)) +
geom_line(position = position_dodge(width = 0.2)) +
geom_point(data = quinn, aes(y = RECRUITS, x = SEASON, colour = DENSITY), position = position_dodge(width = 0.2))
rstanarm captures the MCMC samples from
stan within the returned list. There are numerous ways to
retrieve and summarise these samples. The first three provide convenient
numeric summaries from which you can draw conclusions, the last four
provide ways of obtaining the full posteriors.
The summary() method generates simple summaries (mean,
standard deviation as well as 10, 50 and 90 percentiles).
quinn.rstanarmNB %>% summary()
##
## Model Info:
## function: stan_glm
## family: neg_binomial_2 [log]
## formula: RECRUITS ~ SEASON * DENSITY
## algorithm: sampling
## sample: 1800 (posterior sample size)
## priors: see help('prior_summary')
## observations: 42
## predictors: 8
##
## Estimates:
## mean sd 10% 50% 90%
## (Intercept) 2.4 0.2 2.1 2.4 2.7
## SEASONSummer 1.5 0.3 1.1 1.5 1.9
## SEASONAutumn 0.6 0.3 0.2 0.6 1.0
## SEASONWinter -0.6 0.4 -1.1 -0.6 -0.2
## DENSITYLow 0.0 0.3 -0.4 0.0 0.5
## SEASONSummer:DENSITYLow -0.8 0.5 -1.4 -0.8 -0.2
## SEASONAutumn:DENSITYLow -0.1 0.5 -0.7 -0.1 0.5
## SEASONWinter:DENSITYLow -0.8 0.6 -1.6 -0.8 0.0
## reciprocal_dispersion 4.0 1.3 2.5 3.9 5.6
##
## Fit Diagnostics:
## mean sd 10% 50% 90%
## mean_PPD 19.0 2.9 15.6 18.7 22.7
##
## The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
##
## MCMC diagnostics
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 1743
## SEASONSummer 0.0 1.0 1771
## SEASONAutumn 0.0 1.0 1819
## SEASONWinter 0.0 1.0 1798
## DENSITYLow 0.0 1.0 1513
## SEASONSummer:DENSITYLow 0.0 1.0 1681
## SEASONAutumn:DENSITYLow 0.0 1.0 1710
## SEASONWinter:DENSITYLow 0.0 1.0 1669
## reciprocal_dispersion 0.0 1.0 1963
## mean_PPD 0.1 1.0 1806
## log-posterior 0.1 1.0 1717
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Conclusions:
tidyMCMC(quinn.rstanarmNB$stanfit, estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
Conclusions:
See above
quinn.rstanarmNB$stanfit %>% as_draws_df()
quinn.rstanarmNB$stanfit %>%
as_draws_df() %>%
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
)
Because the parameter names contain characters (such as parentheses and colons) that are awkward to match literally, it is easier to use the regex version.
quinn.rstanarmNB %>% get_variables()
## [1] "(Intercept)" "SEASONSummer"
## [3] "SEASONAutumn" "SEASONWinter"
## [5] "DENSITYLow" "SEASONSummer:DENSITYLow"
## [7] "SEASONAutumn:DENSITYLow" "SEASONWinter:DENSITYLow"
## [9] "reciprocal_dispersion" "accept_stat__"
## [11] "stepsize__" "treedepth__"
## [13] "n_leapfrog__" "divergent__"
## [15] "energy__"
quinn.draw <- quinn.rstanarmNB %>% gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE)
quinn.draw
We can then summarise this
quinn.draw %>% median_hdci()
quinn.rstanarmNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
ggplot() +
stat_halfeye(aes(x = .value, y = .variable)) +
facet_wrap(~.variable, scales = "free")
quinn.rstanarmNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
ggplot() +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_halfeye(aes(x = .value, y = .variable)) +
theme_classic()
We could alternatively express the parameters on the response scale.
quinn.rstanarmNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
group_by(.variable) %>%
mutate(.value = exp(.value)) %>%
median_hdci()
quinn.rstanarmNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
mutate(.value = exp(.value)) %>%
ggplot() +
geom_vline(xintercept = 1, linetype = "dashed") +
stat_halfeye(aes(x = .value, y = .variable)) +
scale_x_continuous("", trans = scales::log2_trans(), breaks = unique(as.vector(2^(0:4 %o% c(-1, 1))))) +
theme_classic()
quinn.rstanarmNB %>% plot(plotfun = "mcmc_intervals")
This is purely a graphical depiction of the posteriors.
quinn.rstanarmNB %>% tidy_draws()
quinn.rstanarmNB %>% spread_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE)
quinn.rstanarmNB %>%
posterior_samples() %>%
as_tibble()
Unfortunately, \(R^2\) calculations
for models other than Gaussian and binomial have not yet been
implemented for rstanarm models.
# quinn.rstanarmNB %>% bayes_R2() %>% median_hdci
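As a rough workaround, a Bayesian \(R^2\) in the spirit of Gelman et al. (2019) can be computed by hand: for each draw, \(R^2 = \mathrm{var}(\text{fitted})/(\mathrm{var}(\text{fitted}) + \mathrm{var}(\text{residuals}))\). The following is a sketch only, assuming the fitted model (quinn.rstanarmNB) and data (quinn) from above:

```r
## manual Bayesian R2: per-draw variance of the fitted values relative to
## the variance of fitted values plus residuals
fits <- posterior_linpred(quinn.rstanarmNB, transform = TRUE) # draws x observations, response scale
resids <- sweep(-fits, 2, quinn$RECRUITS, "+") # observed - fitted, per draw
var_fit <- apply(fits, 1, var)
var_res <- apply(resids, 1, var)
tibble(R2 = var_fit / (var_fit + var_res)) %>% median_hdci(R2)
```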
brms captures the MCMC samples from Stan
within the returned list. There are numerous ways to retrieve and
summarise these samples. The first three approaches below provide convenient numerical
summaries from which conclusions can be drawn; the last four provide
ways of obtaining the full posteriors.
The summary() method generates simple summaries (posterior means,
standard errors and 95% credible intervals, along with Rhat and effective sample size diagnostics).
quinn.brmsNB %>% summary()
## Family: negbinomial
## Links: mu = log; shape = identity
## Formula: RECRUITS ~ SEASON * DENSITY
## Data: quinn (Number of observations: 42)
## Draws: 3 chains, each with iter = 1000; warmup = 400; thin = 5;
## total post-warmup draws = 360
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept 2.42 0.19 2.05 2.82 1.00 1796
## SEASONSummer 1.41 0.25 0.91 1.88 1.00 1519
## SEASONAutumn 0.55 0.26 0.04 1.05 1.00 1831
## SEASONWinter -0.68 0.29 -1.23 -0.10 1.00 1624
## DENSITYLow -0.06 0.26 -0.57 0.44 1.00 1738
## SEASONSummer:DENSITYLow -0.65 0.35 -1.33 0.05 1.00 1633
## SEASONAutumn:DENSITYLow 0.01 0.38 -0.69 0.75 1.00 1773
## SEASONWinter:DENSITYLow -0.62 0.49 -1.59 0.35 1.00 1828
## Tail_ESS
## Intercept 1709
## SEASONSummer 1646
## SEASONAutumn 1787
## SEASONWinter 1827
## DENSITYLow 1728
## SEASONSummer:DENSITYLow 1711
## SEASONAutumn:DENSITYLow 1613
## SEASONWinter:DENSITYLow 1500
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape 6.67 2.96 2.88 13.94 1.00 1793 1499
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Conclusions:
quinn.brmsNB$fit %>% tidyMCMC(estimate.method = "median", conf.int = TRUE, conf.method = "HPDinterval", rhat = TRUE, ess = TRUE)
Conclusions:
see above
quinn.brmsNB %>% as_draws_df()
quinn.brmsNB %>%
as_draws_df() %>%
summarise_draws(
"median",
~ HDInterval::hdi(.x),
"rhat",
"ess_bulk"
)
Because we want to select several parameters at once, and their names contain special characters (the parentheses in `b_Intercept` terms and the colons in the interaction terms), it is easier to use the regex version.
quinn.brmsNB %>% get_variables()
## [1] "b_Intercept" "b_SEASONSummer"
## [3] "b_SEASONAutumn" "b_SEASONWinter"
## [5] "b_DENSITYLow" "b_SEASONSummer:DENSITYLow"
## [7] "b_SEASONAutumn:DENSITYLow" "b_SEASONWinter:DENSITYLow"
## [9] "shape" "lp__"
## [11] "accept_stat__" "stepsize__"
## [13] "treedepth__" "n_leapfrog__"
## [15] "divergent__" "energy__"
quinn.draw <- quinn.brmsNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE)
quinn.draw
We can then summarise these draws:
quinn.draw %>% median_hdci()
quinn.brmsNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
ggplot() +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_slab(aes(
x = .value, y = .variable,
fill = after_stat(ggdist::cut_cdf_qi(cdf,
.width = c(0.5, 0.8, 0.95),
labels = scales::percent_format()
))
), color = "black") +
scale_fill_brewer("Interval", direction = -1, na.translate = FALSE)
quinn.brmsNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
ggplot() +
geom_vline(xintercept = 0, linetype = "dashed") +
stat_halfeye(aes(x = .value, y = .variable)) +
theme_classic()
We could alternatively express the parameters on the response scale.
quinn.brmsNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
group_by(.variable) %>%
mutate(.value = exp(.value)) %>%
median_hdci()
quinn.brmsNB %>%
gather_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE) %>%
mutate(.value = exp(.value)) %>%
ggplot() +
geom_vline(xintercept = 1, linetype = "dashed") +
stat_slab(aes(
x = .value, y = .variable,
fill = after_stat(ggdist::cut_cdf_qi(cdf,
.width = c(0.5, 0.8, 0.95),
labels = scales::percent_format()
))
), color = "black") +
scale_x_continuous("", trans = scales::log2_trans(), breaks = unique(as.vector(2^(0:4 %o% c(-1, 1))))) +
scale_fill_brewer("Interval", direction = -1, na.translate = FALSE) +
theme_classic()
Conclusions:
quinn.brmsNB$fit %>% plot(type = "intervals")
This is purely a graphical depiction of the posteriors.
quinn.brmsNB %>% tidy_draws()
quinn.brmsNB %>% spread_draws(`.Intercept.*|.*SEASON.*|.*DENSITY.*`, regex = TRUE)
quinn.brmsNB %>%
    posterior_samples() %>% # deprecated in newer brms; use as_draws_df() instead
    as_tibble()
quinn.brmsNB %>%
bayes_R2(summary = FALSE) %>%
median_hdci()
In order to tease apart the significant interaction(s), it might be useful to explore the effect of Density separately within each Season.
## fold scale
quinn.rstanarmNB %>%
emmeans(~ DENSITY | SEASON, type = "response") %>%
pairs()
## SEASON = Spring:
## contrast ratio lower.HPD upper.HPD
## High / Low 0.961 0.351 1.72
##
## SEASON = Summer:
## contrast ratio lower.HPD upper.HPD
## High / Low 2.149 1.018 3.75
##
## SEASON = Autumn:
## contrast ratio lower.HPD upper.HPD
## High / Low 1.066 0.478 1.97
##
## SEASON = Winter:
## contrast ratio lower.HPD upper.HPD
## High / Low 2.083 0.457 5.22
##
## Point estimate displayed: median
## Results are back-transformed from the log scale
## HPD interval probability: 0.95
## absolute response scale
quinn.rstanarmNB %>%
emmeans(~ DENSITY | SEASON, type = "link") %>%
regrid() %>%
pairs()
## SEASON = Spring:
## contrast estimate lower.HPD upper.HPD
## High - Low -0.435 -8.51 8.17
##
## SEASON = Summer:
## contrast estimate lower.HPD upper.HPD
## High - Low 25.087 3.80 52.01
##
## SEASON = Autumn:
## contrast estimate lower.HPD upper.HPD
## High - Low 1.159 -13.24 15.28
##
## SEASON = Winter:
## contrast estimate lower.HPD upper.HPD
## High - Low 2.844 -1.82 7.37
##
## Point estimate displayed: median
## HPD interval probability: 0.95
quinn.em <- quinn.rstanarmNB %>%
emmeans(~ DENSITY | SEASON, type = "link") %>%
pairs() %>%
gather_emmeans_draws() %>%
mutate(Fit = exp(.value))
head(quinn.em)
g2 <- quinn.em %>%
group_by(contrast, SEASON) %>%
median_hdci() %>%
ggplot() +
geom_vline(xintercept = 1, linetype = "dashed") +
geom_pointrange(aes(x = Fit, y = SEASON, xmin = Fit.lower, xmax = Fit.upper)) +
scale_x_continuous("Effect size (High/Low)", trans = scales::log2_trans(), breaks = unique(as.vector(2^(0:4 %o% c(-1, 1))))) +
theme_classic()
g2
ggplot(quinn.em, aes(x = Fit)) +
geom_histogram() +
geom_vline(xintercept = 1, linetype = "dashed") +
scale_x_continuous("Effect size (High/Low)", trans = scales::log2_trans(), breaks = unique(as.vector(2^(0:4 %o% c(-1, 1))))) +
facet_wrap(SEASON ~ contrast, scales = "free")
quinn.em %>%
group_by(contrast, SEASON) %>%
median_hdci(Fit)
# Probability of effect
quinn.em %>%
group_by(contrast, SEASON) %>%
summarize(P = sum(Fit > 1) / n())
## Probability of effect greater than 10%
quinn.em %>%
group_by(contrast, SEASON) %>%
summarize(P = sum(Fit > 1.1) / n())
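By the same logic, the probability of a decline of the same magnitude on the fold scale (a High/Low ratio below 1/1.1) can be obtained from the same draws:

```r
## Probability of a decline of more than 10% (fold scale)
quinn.em %>%
    group_by(contrast, SEASON) %>%
    summarize(P = sum(Fit < 1 / 1.1) / n())
```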
newdata <- with(quinn, expand.grid(
SEASON = levels(SEASON),
DENSITY = levels(DENSITY)
))
Xmat <- model.matrix(~ SEASON * DENSITY, data = newdata)
as.matrix(quinn.rstanarmNB) %>% head()
## parameters
## iterations (Intercept) SEASONSummer SEASONAutumn SEASONWinter DENSITYLow
## [1,] 2.525182 1.602973 0.4956754 -0.5901065 -0.3250596
## [2,] 2.200050 1.329697 0.5317508 -0.3082269 0.4194857
## [3,] 2.649075 1.587816 -0.1158720 -1.0883121 -0.3539696
## [4,] 2.241551 1.626216 0.5697111 -0.4027191 0.3299555
## [5,] 2.495713 1.450508 0.2890408 -1.1310840 -0.4473102
## [6,] 2.260671 1.532025 0.6319743 -0.3475176 0.6705982
## parameters
## iterations SEASONSummer:DENSITYLow SEASONAutumn:DENSITYLow
## [1,] -0.4988975 0.06770374
## [2,] -0.7440424 -0.18086443
## [3,] -0.5198897 0.47621283
## [4,] -1.4200070 -0.85024943
## [5,] -0.6847727 0.36802860
## [6,] -1.5506348 -0.72095414
## parameters
## iterations SEASONWinter:DENSITYLow reciprocal_dispersion
## [1,] -0.5086493 6.389064
## [2,] -1.1662444 5.377954
## [3,] -0.8893660 4.288841
## [4,] -1.7308601 3.539346
## [5,] 0.3053695 3.211375
## [6,] -0.6954901 5.365237
## note: as.matrix(quinn.rstanarmNB) would also include the dispersion parameter
coefs <- as.data.frame(quinn.rstanarmNB) %>%
    dplyr::select(-reciprocal_dispersion) %>% # only the regression coefficients are needed
    as.matrix()
fit <- exp(coefs %*% t(Xmat))
newdata <- newdata %>%
cbind(tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval"))
head(newdata)
ggplot(newdata, aes(y = estimate, x = SEASON, fill = DENSITY)) +
geom_blank() +
geom_line(aes(x = as.numeric(SEASON), linetype = DENSITY),
position = position_dodge(0.2)
) +
geom_pointrange(aes(ymin = conf.low, ymax = conf.high),
shape = 21,
position = position_dodge(0.2)
)
# Compare high and low in each season
# via contrasts
newdata <- with(quinn, expand.grid(
SEASON = levels(SEASON),
DENSITY = levels(DENSITY)
))
## factor differences
Xmat <- model.matrix(~ SEASON * DENSITY, data = newdata)
Xmat.high <- Xmat[newdata$DENSITY == "High", ]
Xmat.low <- Xmat[newdata$DENSITY == "Low", ]
Xmat.density <- Xmat.high - Xmat.low
rownames(Xmat.density) <- levels(quinn$SEASON)
coefs <- as.matrix(as.data.frame(quinn.rstanarmNB) %>% dplyr::select(-reciprocal_dispersion))
fit <- exp(coefs %*% t(Xmat.density))
tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")
## or absolute
fit.high <- coefs %*% t(Xmat.high)
fit.low <- coefs %*% t(Xmat.low)
fit <- exp(fit.high) - exp(fit.low)
# alternatively, fit <- exp(fit.high - fit.low) would give the fold (ratio) scale
tidyMCMC(fit, conf.int = TRUE, conf.method = "HPDinterval")
quinn.brmsNB %>%
emmeans(~ DENSITY | SEASON, type = "response") %>%
pairs()
## SEASON = Spring:
## contrast ratio lower.HPD upper.HPD
## High / Low 1.05 0.601 1.69
##
## SEASON = Summer:
## contrast ratio lower.HPD upper.HPD
## High / Low 2.04 1.116 3.11
##
## SEASON = Autumn:
## contrast ratio lower.HPD upper.HPD
## High / Low 1.04 0.549 1.81
##
## SEASON = Winter:
## contrast ratio lower.HPD upper.HPD
## High / Low 1.97 0.657 4.16
##
## Point estimate displayed: median
## Results are back-transformed from the log scale
## HPD interval probability: 0.95
## absolute response scale
quinn.brmsNB %>%
emmeans(~ DENSITY | SEASON, type = "link") %>%
regrid() %>%
pairs()
## SEASON = Spring:
## contrast estimate lower.HPD upper.HPD
## High - Low 0.555 -4.842 6.69
##
## SEASON = Summer:
## contrast estimate lower.HPD upper.HPD
## High - Low 23.301 7.182 42.71
##
## SEASON = Autumn:
## contrast estimate lower.HPD upper.HPD
## High - Low 0.832 -10.789 12.41
##
## SEASON = Winter:
## contrast estimate lower.HPD upper.HPD
## High - Low 2.779 -0.679 6.33
##
## Point estimate displayed: median
## HPD interval probability: 0.95
quinn.em <- quinn.brmsNB %>%
emmeans(~ DENSITY | SEASON, type = "link") %>%
pairs() %>%
gather_emmeans_draws() %>%
mutate(Fit = exp(.value))
head(quinn.em)
g2 <- quinn.em %>%
group_by(contrast, SEASON) %>%
median_hdci() %>%
ggplot() +
geom_vline(xintercept = 1, linetype = "dashed") +
geom_pointrange(aes(x = Fit, y = SEASON, xmin = Fit.lower, xmax = Fit.upper)) +
scale_x_continuous("Effect size (High/Low)", trans = scales::log2_trans(), breaks = unique(as.vector(2^(0:4 %o% c(-1, 1))))) +
theme_classic()
g2
ggplot(quinn.em, aes(x = Fit)) +
geom_histogram() +
geom_vline(xintercept = 1, linetype = "dashed") +
scale_x_continuous("Effect size (High/Low)", trans = scales::log2_trans(), breaks = unique(as.vector(2^(0:4 %o% c(-1, 1))))) +
facet_wrap(SEASON ~ contrast, scales = "free")
quinn.em %>%
group_by(contrast, SEASON) %>%
median_hdci(Fit)
# Probability of effect
quinn.em %>%
group_by(contrast, SEASON) %>%
summarize(P = sum(Fit > 1) / n())
## Probability of effect greater than 10%
quinn.em %>%
group_by(contrast, SEASON) %>%
summarize(P = sum(Fit > 1.1) / n())
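As with the rstanarm model, the probability of a decline of the same magnitude on the fold scale (a High/Low ratio below 1/1.1) follows the same pattern:

```r
## Probability of a decline of more than 10% (fold scale)
quinn.em %>%
    group_by(contrast, SEASON) %>%
    summarize(P = sum(Fit < 1 / 1.1) / n())
```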
newdata <- quinn.rstanarmNB %>%
emmeans(~ SEASON | DENSITY, type = "response") %>%
as.data.frame()
head(newdata)
g1 <- ggplot(newdata, aes(y = prob, x = SEASON, color = DENSITY)) +
geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD),
position = position_dodge(width = 0.2)
) +
theme_classic()
g1 + g2
newdata <- quinn.brmsNB %>%
emmeans(~ SEASON | DENSITY, type = "response") %>%
as.data.frame()
head(newdata)
g1 <- ggplot(newdata, aes(y = prob, x = SEASON, color = DENSITY)) +
geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD),
position = position_dodge(width = 0.2)
) +
theme_classic()
g1 + g2
## add an observation-level random effect; each row must receive a unique
## level so that the random effect can absorb per-observation overdispersion
quinn <- quinn %>%
    ungroup() %>%
    mutate(Obs = factor(1:n()))
quinn.form <- bf(RECRUITS ~ SEASON * DENSITY + (1 | Obs), family = poisson(link = "log"))
get_prior(quinn.form, data = quinn)
quinn.brmsU <- brm(quinn.form,
data = quinn,
refresh = 0,
chains = 3,
iter = 5000,
thin = 5,
warmup = 2000
)
preds <- posterior_predict(quinn.brmsU, nsamples = 250) # use ndraws = 250 in newer brms
quinn.resids <- createDHARMa(
simulatedResponse = t(preds),
observedResponse = quinn$RECRUITS,
fittedPredictedResponse = apply(preds, 2, median),
integerResponse = TRUE
)
plot(quinn.resids)
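DHARMa also provides formal tests on the simulated residuals. As an additional check on the observation-level random-effect model (a sketch assuming the quinn.resids object just created), we can test whether any overdispersion or zero-inflation remains:

```r
## formal residual tests: dispersion and zero-inflation
testDispersion(quinn.resids)
testZeroInflation(quinn.resids)
```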
newdata <- emmeans(quinn.brmsU, ~ SEASON | DENSITY, type = "response") %>% as.data.frame()
newdata
ggplot(newdata, aes(y = rate, x = SEASON, color = DENSITY)) +
geom_pointrange(aes(ymin = lower.HPD, ymax = upper.HPD),
position = position_dodge(width = 0.2)
)